Skip to content

Conversation

@xuxiong1
Copy link
Contributor

@xuxiong1 xuxiong1 commented Dec 3, 2025

Description

Added support for handling non-segment files (custom metadata) in subdirectory-aware store recovery.

  • Renamed loadSubdirectoryMetadataFromSegments() to loadSubdirectoryMetadata()
  • Added computeFileMetadata() to calculate metadata (checksum, length) for non-segment files

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Summary by CodeRabbit

  • New Features

    • Recover and account for custom non-segment metadata files stored alongside index segments.
  • Bug Fixes

    • Recovery now validates that both segment and non-segment metadata files match across nodes to avoid inconsistent recoveries.
  • Tests

    • Added test coverage validating loading, checksums, and mixed segment/non-segment recovery scenarios.
  • Chores

    • Changelog updated to note handling of custom metadata files in subdirectory store.

✏️ Tip: You can customize this high-level summary in your review settings.

@xuxiong1 xuxiong1 requested a review from a team as a code owner December 3, 2025 00:58
@coderabbitai
Copy link

coderabbitai bot commented Dec 3, 2025

Walkthrough

Loads and validates metadata for both segment files and custom non-segment files (prefixed test_metadata_) in subdirectory stores; computes length, checksum, and Lucene version for non-segment files, updates metadata aggregation, and adds tests that create and require consistent subdirectory file sets across nodes.

Changes

Cohort / File(s) Change Summary
Subdirectory-aware store logic
modules/store-subdirectory/src/main/java/org/opensearch/plugin/store/subdirectory/SubdirectoryAwareStore.java
Replaces segment-only loader with loadSubdirectoryMetadata handling both segments_* and non-segment files; adds helper to compute metadata (length, checksum, writtenBy/version) for individual non-segment files; integrates non-segment files into metadata aggregation; adds CodecUtil/Version imports.
Recovery tests (internalClusterTest)
modules/store-subdirectory/src/internalClusterTest/java/org/opensearch/plugin/store/subdirectory/SubdirectoryAwareRecoveryTests.java
Builds per-node file sets that include both segment files and non-segment files (test_metadata_*); creates non-segment Lucene-compatible files during flush/commit; extends TestIndexCommit and wires unique-name counter and prefix; adds imports to support file creation (CodecUtil, IOContext, IndexOutput).
Plugin unit tests / test scaffolding
modules/store-subdirectory/src/test/java/org/opensearch/plugin/store/subdirectory/SubdirectoryStorePluginTests.java
Adds test helpers (createTestStoreSetup, writeTestFileWithCodec), a TestIndexCommit that aggregates main-directory and subdirectory files and filters custom metadata files, and multiple tests verifying metadata loading, checksums, and mixed segment/non-segment handling; adds Lucene IO imports and checksum validation logic.
Changelog
CHANGELOG.md
Adds Unreleased 3.x entry: "Handle custom metadata files in subdirectory-store (#20157)".

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Review metadata computation (checksum/version) in SubdirectoryAwareStore for correctness and edge cases.
  • Verify non-segment file creation code (headers/footers, IOContext usage) in tests and recovery path to avoid flakiness.
  • Inspect TestIndexCommit implementations and file-list aggregation to ensure no duplicate/missing entries across main and subdirectory files.

Poem

🐰 I nudged the bytes and hummed a tune,
Between segments and meta under the moon,
Checksums danced and files aligned,
Nodes compared what they could find,
🥕✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.93% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Description check ❓ Inconclusive The PR description includes a brief summary of changes and checks the testing box, but lacks a 'Related Issues' section and provides minimal detail about what was achieved compared to the template structure. Add a 'Related Issues' section referencing issue #20157 and expand the description to explain the motivation and impact of the changes in more detail.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Handle custom metadata files in subdirectory-store' clearly and concisely summarizes the main change: adding support for non-segment (custom metadata) files in subdirectory-aware store recovery.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
modules/store-subdirectory/src/main/java/org/opensearch/plugin/store/subdirectory/SubdirectoryAwareStore.java (2)

140-154: Consider more precise segment file identification.

The check fileName.contains(IndexFileNames.SEGMENTS) could potentially match non-segment files that happen to contain "segments" in their name. Consider using a more precise check:

-                if (fileName.contains(IndexFileNames.SEGMENTS)) {
+                String localFileName = filePath.getFileName().toString();
+                if (localFileName.startsWith(IndexFileNames.SEGMENTS)) {

This ensures only actual segment files (prefixed with "segments_") are matched, rather than any file containing "segments" anywhere in the path.


247-247: Consider documenting the version selection rationale.

Using minimumIndexCompatibilityVersion().luceneVersion is a reasonable fallback for non-segment files where the actual writer version is unknown. Consider adding a brief comment explaining this choice for future maintainability.

+                // Use minimum compatible version as fallback since non-segment files don't store writer version
                 Version version = org.opensearch.Version.CURRENT.minimumIndexCompatibilityVersion().luceneVersion;
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2b6d266 and 65c8087.

📒 Files selected for processing (2)
  • modules/store-subdirectory/src/internalClusterTest/java/org/opensearch/plugin/store/subdirectory/SubdirectoryAwareRecoveryTests.java (7 hunks)
  • modules/store-subdirectory/src/main/java/org/opensearch/plugin/store/subdirectory/SubdirectoryAwareStore.java (4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: gradle-check
  • GitHub Check: assemble (21, windows-latest)
  • GitHub Check: assemble (25, ubuntu-24.04-arm)
  • GitHub Check: assemble (21, ubuntu-latest)
  • GitHub Check: assemble (25, windows-latest)
  • GitHub Check: assemble (21, ubuntu-24.04-arm)
  • GitHub Check: assemble (25, ubuntu-latest)
  • GitHub Check: precommit (21, windows-2025, true)
  • GitHub Check: precommit (21, macos-15-intel)
  • GitHub Check: precommit (25, macos-15-intel)
  • GitHub Check: precommit (21, ubuntu-24.04-arm)
  • GitHub Check: precommit (25, ubuntu-latest)
  • GitHub Check: precommit (25, ubuntu-24.04-arm)
  • GitHub Check: precommit (25, macos-15)
  • GitHub Check: precommit (21, windows-latest)
  • GitHub Check: precommit (25, windows-latest)
  • GitHub Check: precommit (21, ubuntu-latest)
  • GitHub Check: precommit (21, macos-15)
  • GitHub Check: detect-breaking-change
  • GitHub Check: Analyze (java)
🔇 Additional comments (9)
modules/store-subdirectory/src/internalClusterTest/java/org/opensearch/plugin/store/subdirectory/SubdirectoryAwareRecoveryTests.java (6)

11-22: LGTM!

The new imports for CodecUtil, IOContext, and IndexOutput are necessary for the non-segment file creation functionality.


116-133: LGTM!

The verification logic correctly aggregates both segment files (from SegmentInfos) and non-segment files (by prefix matching), ensuring comprehensive recovery validation.


146-157: LGTM!

The additional verification ensures non-segment files are present in the subdirectory after recovery, which is essential for validating the new functionality.


205-211: LGTM!

The constant and counter are well-defined. The counter is an instance field, which is appropriate for this test engine context where each shard has its own engine instance.


259-273: Well-structured non-segment file creation.

The method properly creates a Lucene-compatible file with CodecUtil.writeHeader and CodecUtil.writeFooter, ensuring the checksum can be validated during recovery. The use of try-with-resources ensures proper resource cleanup.


343-351: LGTM!

The modification to include non-segment files in getFileNames() is consistent with the verification logic and ensures these files are properly included in the recovery process.

modules/store-subdirectory/src/main/java/org/opensearch/plugin/store/subdirectory/SubdirectoryAwareStore.java (3)

13-23: LGTM!

The new imports for CodecUtil and Version are necessary for the checksum computation and version handling in the new computeFileMetadata method.


127-128: LGTM!

The updated comment and method call correctly reflect the expanded scope to include non-segment files.


158-167: LGTM!

The processing logic correctly handles both file types:

  • Segment files are processed via loadMetadataFromSubdirectorySegmentsFile which aggregates all referenced segment files
  • Non-segment files are only processed if not already present in the builder, avoiding duplicate work

@xuxiong1
Copy link
Contributor Author

xuxiong1 commented Dec 3, 2025

@coderabbitai generate docstrings

@coderabbitai
Copy link

coderabbitai bot commented Dec 3, 2025

✅ Actions performed

Initiated docstring generation; will generate only if new commits exist.

@coderabbitai
Copy link

coderabbitai bot commented Dec 3, 2025

Note

Docstrings generation - SUCCESS
Generated docstrings for this pull request at #20158

coderabbitai bot added a commit that referenced this pull request Dec 3, 2025
Docstrings generation was requested by @xuxiong1.

* #20157 (comment)

The following files were modified:

* `modules/store-subdirectory/src/main/java/org/opensearch/plugin/store/subdirectory/SubdirectoryAwareStore.java`
@github-actions
Copy link
Contributor

github-actions bot commented Dec 3, 2025

❌ Gradle check result for 65c8087: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: xuxiong1 <[email protected]>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
modules/store-subdirectory/src/internalClusterTest/java/org/opensearch/plugin/store/subdirectory/SubdirectoryAwareRecoveryTests.java (1)

342-350: LGTM!

The addition correctly includes non-segment files in the IndexCommit.getFileNames() result, ensuring these files are included during recovery. The logic appropriately mirrors the verification method for consistency.

Optional: The non-segment file detection logic (filtering by prefix) is duplicated here and in verifySubdirectoryFilesOnAllNodes. Consider extracting a helper method like isNonSegmentFile(String fileName) to centralize this logic, though for test code the current approach is acceptable.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 65c8087 and bfd6b6f.

📒 Files selected for processing (2)
  • CHANGELOG.md (1 hunks)
  • modules/store-subdirectory/src/internalClusterTest/java/org/opensearch/plugin/store/subdirectory/SubdirectoryAwareRecoveryTests.java (7 hunks)
✅ Files skipped from review due to trivial changes (1)
  • CHANGELOG.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: gradle-check
  • GitHub Check: detect-breaking-change
  • GitHub Check: precommit (25, windows-latest)
  • GitHub Check: precommit (21, windows-latest)
  • GitHub Check: precommit (21, windows-2025, true)
  • GitHub Check: precommit (25, ubuntu-24.04-arm)
  • GitHub Check: precommit (25, macos-15-intel)
  • GitHub Check: precommit (21, macos-15)
  • GitHub Check: precommit (25, ubuntu-latest)
  • GitHub Check: precommit (21, ubuntu-24.04-arm)
  • GitHub Check: precommit (25, macos-15)
  • GitHub Check: precommit (21, ubuntu-latest)
  • GitHub Check: precommit (21, macos-15-intel)
  • GitHub Check: assemble (25, windows-latest)
  • GitHub Check: assemble (21, ubuntu-latest)
  • GitHub Check: assemble (21, ubuntu-24.04-arm)
  • GitHub Check: assemble (25, ubuntu-24.04-arm)
  • GitHub Check: assemble (21, windows-latest)
  • GitHub Check: assemble (25, ubuntu-latest)
  • GitHub Check: Analyze (java)
🔇 Additional comments (5)
modules/store-subdirectory/src/internalClusterTest/java/org/opensearch/plugin/store/subdirectory/SubdirectoryAwareRecoveryTests.java (5)

11-22: LGTM!

The new imports are necessary for creating Lucene-compatible non-segment files with proper codec headers and footers.


116-133: LGTM!

The verification logic correctly aggregates both segment files (via SegmentInfos) and non-segment files (via prefix filtering) to ensure complete file coverage during recovery verification.


146-156: LGTM!

Good addition of the assertion to verify non-segment files are actually present. This ensures the test validates the intended functionality rather than silently passing when no non-segment files exist.


204-210: LGTM!

The constant for non-segment file prefix and the instance-level counter for generating unique file names are appropriately defined.


251-272: LGTM!

The createNonSegmentFile method correctly creates Lucene-compatible files with proper codec headers and footers. This ensures the non-segment files have valid checksums that can be verified during recovery, which aligns with the PR's goal of testing custom metadata file handling.

@github-actions
Copy link
Contributor

github-actions bot commented Dec 3, 2025

✅ Gradle check result for bfd6b6f: SUCCESS

@codecov
Copy link

codecov bot commented Dec 3, 2025

Codecov Report

❌ Patch coverage is 95.65217% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 73.19%. Comparing base (da18cc6) to head (42c9c06).
⚠️ Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
...gin/store/subdirectory/SubdirectoryAwareStore.java 95.65% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #20157      +/-   ##
============================================
- Coverage     73.25%   73.19%   -0.07%     
- Complexity    71684    71706      +22     
============================================
  Files          5788     5793       +5     
  Lines        327866   328073     +207     
  Branches      47218    47248      +30     
============================================
- Hits         240194   240134      -60     
- Misses        68399    68599     +200     
- Partials      19273    19340      +67     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Signed-off-by: xuxiong1 <[email protected]>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (5)
modules/store-subdirectory/src/test/java/org/opensearch/plugin/store/subdirectory/SubdirectoryStorePluginTests.java (5)

150-187: TestStoreSetup/createTestStoreSetup look sound, but consider reusing in existing tests

The holder and factory encapsulate shard/store setup cleanly and mirror the existing testStats setup; construction of SubdirectoryAwareStore and paths looks consistent. You might optionally refactor testStats to use createTestStoreSetup() as well to avoid duplication and keep test configuration changes in one place.


241-292: Mixed files test logic is good; re-use subdirectory path helper to avoid drift

The mixed segment/non‑segment scenario and duplicate‑detection assertions are solid and should catch regressions in how getMetadata merges file lists.

Same concern as above about setup.path.resolve("subdir"): it would be safer to compute this path via a shared helper (or a constant matching the production subdirectory name and relative root) so both tests and implementation refer to the same physical location.


294-349: Checksum validation test is strong; consider asserting presence of both files explicitly

This test nicely validates that the checksum produced by the store metadata matches a direct CodecUtil.checksumEntireFile computation and that different content yields different checksums.

Two small suggestions:

  • Add explicit assertTrue(metadata.contains("subdir/checksum_test.dat")) / contains("subdir/checksum_test2.dat") before reading from metadata2 to get clearer failure messages if the files are accidentally filtered out.
  • As with the other tests, consider factoring out subdirectory path computation to ensure it stays aligned with the production subdirectory location.

351-361: writeTestFileWithCodec helper is correct; optional deterministic timestamp

The helper correctly writes a Lucene‑style header/footer and some payload, which is appropriate for files that will be checksummed via Lucene APIs.

If you ever need bit‑for‑bit deterministic test data (e.g., across JVMs or runs), you could replace System.currentTimeMillis() with a fixed value. For the current use (only comparing different files), the current approach is fine.


363-460: TestIndexCommit implementation is generally correct; consider deduplication and small robustness tweaks

The custom TestIndexCommit correctly:

  • Delegates segment metadata to a captured SegmentInfos.
  • Aggregates main‑directory segment files plus any subdirectory segment files (when a commit exists).
  • Adds non‑segment “custom metadata” files while filtering out Lucene internals and test infrastructure via isCustomMetadataFile.

A few optional improvements:

  1. Avoid shadowing directory: inside getFileNames, the local Directory directory = FSDirectory.open(subdirPath) shadows the field directory. Renaming the local (e.g., subdirDirectory) would improve readability.

  2. Return a Set to enforce uniqueness: using a Set<String> for allFiles would guarantee uniqueness at construction time rather than relying on downstream distinct() checks.

  3. Code reuse with existing TestIndexCommit: there’s a very similar TestIndexCommit in SubdirectoryAwareRecoveryTests. If feasible, consider sharing a common helper (e.g., in a test utility) to avoid divergence in how file lists are composed and filtered.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bfd6b6f and 42c9c06.

📒 Files selected for processing (1)
  • modules/store-subdirectory/src/test/java/org/opensearch/plugin/store/subdirectory/SubdirectoryStorePluginTests.java (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
modules/store-subdirectory/src/test/java/org/opensearch/plugin/store/subdirectory/SubdirectoryStorePluginTests.java (2)
libs/common/src/main/java/org/opensearch/common/util/io/IOUtils.java (1)
  • IOUtils (58-317)
modules/store-subdirectory/src/internalClusterTest/java/org/opensearch/plugin/store/subdirectory/SubdirectoryAwareRecoveryTests.java (1)
  • TestIndexCommit (314-385)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: gradle-check
  • GitHub Check: Analyze (java)
  • GitHub Check: precommit (25, ubuntu-24.04-arm)
  • GitHub Check: precommit (25, macos-15-intel)
  • GitHub Check: precommit (21, windows-2025, true)
  • GitHub Check: precommit (25, windows-latest)
  • GitHub Check: precommit (21, macos-15-intel)
  • GitHub Check: precommit (25, macos-15)
  • GitHub Check: precommit (25, ubuntu-latest)
  • GitHub Check: precommit (21, ubuntu-24.04-arm)
  • GitHub Check: precommit (21, ubuntu-latest)
  • GitHub Check: precommit (21, windows-latest)
  • GitHub Check: precommit (21, macos-15)
  • GitHub Check: detect-breaking-change
  • GitHub Check: assemble (25, ubuntu-latest)
  • GitHub Check: assemble (25, ubuntu-24.04-arm)
  • GitHub Check: assemble (25, windows-latest)
  • GitHub Check: assemble (21, ubuntu-24.04-arm)
  • GitHub Check: assemble (21, windows-latest)
  • GitHub Check: assemble (21, ubuntu-latest)

@github-actions
Copy link
Contributor

github-actions bot commented Dec 3, 2025

✅ Gradle check result for 42c9c06: SUCCESS

Copy link
Contributor

@yupeng9 yupeng9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall looks good

try (IndexInput in = dir.openInput(localFileName, IOContext.READONCE)) {
long length = in.length();
String checksum = Store.digestToString(CodecUtil.checksumEntireFile(in));
Version version = org.opensearch.Version.CURRENT.minimumIndexCompatibilityVersion().luceneVersion;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for non-segment info file, we also need lucene version as placeholder?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if you check the constructor of StoreFileMetadata.java, the version is mandatory.

this.writtenBy = Objects.requireNonNull(writtenBy, "writtenBy must not be null");

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants